Abstract

Background: Glyceraldehyde-3-phosphate dehydrogenase (GAPD) catalyses one of the glycolytic reactions and is 
also involved in a number of non-glycolytic processes, such as endocytosis, DNA excision repair, and induction of 
apoptosis. Mammals are known to possess two homologous GAPD isoenzymes: GAPD-1, a well-studied protein
found in all somatic cells, and GAPD-2, which is expressed solely in testis. GAPD-2 supplies energy required for the 
movement of spermatozoa and is tightly bound to the sperm tail cytoskeleton by the additional N-terminal 
proline-rich domain absent in GAPD-1. In this study we investigate the evolutionary history of GAPD and gain 
some insights into specialization of GAPD-2 as a testis-specific protein. 

Results: A dataset of GAPD sequences was assembled from public databases and used for phylogeny 
reconstruction by means of the Bayesian method. Since resolution in some clades of the obtained tree was too 
low, syntenic analysis was carried out to define the evolutionary history of GAPD more precisely. The performed 
selection tests showed that selective pressure varies across lineages and isoenzymes, as well as across different 
regions of the same sequences. 

Conclusions: The obtained results suggest that GAPD-1 and GAPD-2 emerged after duplication during the early 
evolution of chordates. GAPD-2 was subsequently lost by most lineages except lizards, mammals, as well as 
cartilaginous and bony fishes. In reptilians and mammals, GAPD-2 specialized to a testis-specific protein and 
acquired the novel N-terminal proline-rich domain anchoring the protein in the sperm tail cytoskeleton. This 
domain is likely to have originated by exonization of a microsatellite genomic region. Recognition of the proline- 
rich domain by cytoskeletal proteins seems to be unspecific. Besides testis, GAPD-2 of lizards was also found in 
some regenerating tissues, but it lacks the proline-rich domain due to tissue-specific alternative splicing. 



Background 

Glyceraldehyde-3-phosphate dehydrogenase (GAPD, EC
1.2.1.12) is a homotetrameric glycolytic enzyme catalysing
the oxidative phosphorylation of 3-phosphoglyceraldehyde to
1,3-diphosphoglycerate coupled with the reduction of NAD+ to
NADH. Mammals are known to possess two tissue-specific
GAPD isoenzymes: somatic (GAPD-1) and testis-specific
(GAPD-2, GAPDS). For Homo sapiens, their
protein sequences are 68% identical. Besides the two isoenzymes,
a large number of GAPD pseudogenes have been
found in the genomes of primates and rodents [1,2].
Mammalian GAPD-1 is a well-studied protein, a high
concentration of which in cells (5-15% of all cytoplasmic 
proteins) confirms its functional significance. Recent 
studies established that GAPD-1 is not simply a classical 
metabolic protein involved in glycolytic energy produc- 
tion, but rather a multifunctional protein with specific 
functions in numerous processes [3,4]. GAPD-1 was 
shown to display both cytosolic and nuclear localization 
participating in endocytosis [5-7], plasma membrane 
fusion [8], microtubule assembly [9,10], secretory vesicu- 
lar transport [11,12], protein phosphotransferase/kinase 
reactions [13,14], translational and transcriptional con- 
trols of gene expression [15-17], regulation of telomere 
structure [18,19], nuclear membrane fusion [20], nuclear 
RNA transport [21], DNA excision-repair [22,23] and 
induction of apoptosis in case of oxidative stress 
[24-27]. Furthermore, GAPD-1 was implicated in 
Alzheimer's [28-30] and Huntington's [30-32] neurodegenerative
diseases.

As opposed to soluble GAPD-1, mammalian GAPD-2 
is tightly attached to the cytoskeleton, namely to the 
principal piece of the spermatic filament fibrous sheath 
[33-35]. The attachment is mediated by an additional 
N-terminal proline-rich domain of 74 amino acids 
[35,36]. GAPD-2 supplies the dynein ATPases of the filament
with energy, thereby playing a crucial role in
maintaining sperm motility. Disruption of its
expression generally leads to infertility [37]. Due to its
strong association with the cytoskeleton, GAPD-2 remains
within the insoluble fraction after cell disruption, which significantly
complicates its experimental investigation. As a
result, there are only limited data on GAPD-2 properties.
It was recently found to display enhanced stability
towards denaturation, which may be an adaptation to the
absence of protein expression in spermatozoa. The enzyme
kinetics of GAPD-2 were also found to differ from
those of GAPD-1 [38]. Based on the
study of short functional motifs of both mammalian
isoenzymes, GAPD-2 was proposed to evade involvement
in most non-glycolytic processes characteristic
of GAPD-1 [39].

GAPD-1 and GAPD-2 are also possessed by some 
other vertebrates besides Mammalia [40-42], but their 
expression is apparently not always tissue-specific. In
the bony fish Oplegnathus fasciatus both GAPD mRNAs
were detected ubiquitously in all tissues examined
[40], and therefore the functional specificities of the isoenzymes
seem to differ from the mammalian ones.
Based on phylogenetic trees, it was hypothesized
that GAPD could have diverged into the isoenzymes around the
origin of Bilateria, but as only vertebrates have retained
GAPD-2, this scenario seems unlikely. However, even some
vertebrates (e.g. Xenopus laevis) were found to lack
GAPD-2 [42].

Single copy genes are thought to evolve conservatively 
because of strong negative selective pressure. Gene 
duplications produce a redundant gene copy and thus 
release one or both copies from negative selective pres- 
sure. Thus, duplications should be an important precur- 
sor of functional divergence. The increased availability 
of sequences in the public databases allows the investi- 
gation of the molecular evolution of the GAPD gene 
family and the evaluation of selection following duplication
events. In the present study we focus on the evolution
of the poorly investigated GAPD-2 isoenzyme.
Previously GAPD-2 was found to be specific to
vertebrates [42]. Therefore we focus on this taxon
as well as on the other groups of deuterostomes not
considered in [42]. Specifically, we (1) examine the evolutionary
history of GAPD-2 and other GAPD isoenzymes
of deuterostomes, (2) evaluate lineage-specific
changes in selective pressure affecting GAPD isoenzymes,
and (3) look into the specialization of GAPD-2
into a testis-specific protein.

Results 

Sequences of GAPD family members 

The numbers of discovered GAPD family members for
all examined species are shown in Figure 1.
Mammalian GAPD sequences were extracted from the 
Ensembl database. For most species (19 of the 25 exam- 
ined) two different sequences were obtained. One of 
these sequences always contained an additional proline- 
rich domain at the N-terminus, as observed in the 
human GAPD-2. A single GAPD sequence was obtained 
for each of the 6 remaining mammalian species, either 
with or without the proline-rich domain. The lack of 
the second sequence seems to be due to incompleteness 
of genomes. 

GAPD sequences of teleosts were obtained using both 
Ensembl (5 species present in this database) and BLAST 
searches against RefSeq transcripts and the EST division 
of GenBank (species not covered by Ensembl). Three 
different sequences were discovered for 4 species, two
sequences for 6 species and a single sequence for 3
species. The differences between the numbers of
obtained GAPD sequences are not necessarily a result of 
data incompleteness and may be biologically relevant. 
For example, only two sequences were identified within 
a complete genome of zebrafish. 

Identification of GAPD sequences of all other species 
was performed by conducting BLAST searches against 
RefSeq transcripts and the EST division of GenBank. 
Two different GAPD sequences were discovered for 
lizards, some cartilaginous fishes, some jawless verte- 
brates, some tunicates and a few non-deuterostomes (3 
of 10 insects, a leech and a flatworm). Single GAPD 
sequences were discovered for all examined birds, reptiles 
except lizards, amphibians, lancelet, echinoderms, acorn 
worm, Xenoturbella bocki, as well as for the remaining 
cartilaginous fishes, jawless vertebrates, tunicates and 
most examined non-deuterostomes. Two species (Xenopus
laevis and Ciona savignyi) were even found to possess
three slightly different GAPD family members.

Tissue-specific translation of the proline-rich domain in
lizard GAPD

Besides mammalian GAPD-2, proline-rich domains were 
detected only in one of the GAPD isoenzymes of lizard 
species: Anolis carolinensis and Gekko gecko. ESTs of A. 
carolinensis encoding this isoenzyme originated from tes- 
tis [GenBank:FG786985, GenBank:FG793471, GenBank: 
FG801901, GenBank:FG802958], regenerating tail [Gen- 
Bank:FG771974, GenBank:FG779496] and the whole 
embryo [GenBank:FG720854]. It is remarkable that ESTs 
Figure 1 Numbers of predicted GAPD genes for the examined species. Yellow corresponds to species with a single predicted gene,
blue to species with two genes and red to species with three genes. Taxonomy was obtained from the NCBI Taxonomy database [119]; the figure was prepared
with iTOL [120].



from regenerating tail and embryo lack a fragment of 103
nucleotides (shortened variant), which is present in ESTs
from testis (full-length variant; see Figure 2 and additional
file 1: Alignment of the two forms of Anolis carolinensis
GAPD-2 mRNA). This fragment is situated near
the 5'-terminus and encodes the beginning of the proline-rich
domain including an ATG start codon. The
next possible start codon, which is present in both EST



variants, is located right after the proline-rich domain. So 
the protein with the proline-rich domain should be trans- 
lated only from the full-length EST variant. Translation 
of the shortened EST variant should begin from the sec- 
ond start codon such that the product will not possess 
the proline-rich domain. 
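
The reasoning above can be illustrated with a minimal sketch. It assumes a simplified scanning model in which translation starts at the first ATG of the mRNA. The three exon sequences are short hypothetical stand-ins chosen for readability (they are not the real A. carolinensis exons), and the function name is illustrative only.

```python
# Toy illustration: skipping an exon that carries the first ATG moves
# translation initiation to the next ATG downstream.
# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(mrna):
    """Translate from the first ATG to the first in-frame stop codon."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon reached
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical exons: exon 2 holds start codon #1, exon 3 holds start codon #2.
EXON1 = "GTTCCG"
EXON2 = "GACATGAATC"
EXON3 = "CACCACCTCCGATGGCTGAATTCTAA"

full_length = EXON1 + EXON2 + EXON3  # exon 2 retained (as in testis)
shortened = EXON1 + EXON3            # exon 2 spliced out

print(translate_from_first_atg(full_length))  # MNPPPPMAEF
print(translate_from_first_atg(shortened))    # MAEF
```

In this toy case the product of the shortened variant is identical to the C-terminal part of the full-length product and simply lacks the proline-rich N-terminus, which mirrors the situation described for the two EST variants.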

The existence of the two EST variants is most likely a result of
tissue-specific alternative splicing: the exon of 103

Figure 2 Alternative splicing in GAPD-2 of Anolis carolinensis. Alternative splicing seems to govern the presence of the proline-rich domain in
GAPD-2 of the lizard Anolis carolinensis. If the second exon is spliced out, the protein product will lack the proline-rich domain; otherwise it will
possess this domain. A) Map of the GAPD-2 gene constructed from both Ensembl and EST data. Exons are in yellow, introns and non-transcribed
regions are in blue. The positions of the two possible translation initiation sites (site #1, used in testis; site #2, used in regenerating tissues) are marked, as well as the position of the translation termination
site. B) Alignment of the 5'-termini of the full-length and shortened (lacking the second exon) mRNAs. The sequences of possible protein products
are also shown.



nucleotides was either preserved, as in gonads, or spliced 
out, as in embryo and regenerating tail. Thus, the presence 
of the proline-rich domain has a tissue-specific character. 

A few ESTs of G. gecko were extracted from samples 
of injured brain and spinal cord [GenBank:EB170778, 
GenBank:CV053413] and had incomplete 5'-termini: 
only a part of the sequence encoding the proline-rich 
domain was present. Therefore it is impossible to ascertain
whether the translation of the proline-rich domain
in G. gecko is governed by alternative splicing as in A.
carolinensis.

Phylogeny and syntenic analyses 

Analysis of the orthologous and paralogous relationships 
of GAPD isoenzymes among different species was carried 



out by combining the phylogeny reconstruction of the 
GAPD gene family with syntenic comparison. The phylo- 
genetic tree constructed from amino-acid sequences 
corresponded poorly to the established
view of deuterostome evolution, probably
due to high sequence conservation (only 48 of 335
residues differ between human GAPD-1 and its
ortholog in zebrafish). Therefore we switched to
nucleotide sequences, which are less conserved. Indeed, the
resulting phylogenetic tree (Figure 3) corresponded better
to the established view of evolution,
but was still far from perfect. For example, tunicates were
closer to mammals than fishes were.
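
To put the conservation figure in perspective, 48 differing residues out of 335 correspond to roughly 86% identity at the protein level. The sketch below shows this arithmetic together with a generic identity calculation for two aligned sequences; the helper name and the short example strings are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns without gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Figures quoted in the text: 48 of 335 residues differ.
identity_from_counts = 100.0 * (335 - 48) / 335
print(round(identity_from_counts, 1))  # 85.7

# Hypothetical four-residue alignment with one mismatch.
print(percent_identity("MGKV", "MGRV"))  # 75.0
```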

All GAPD isoenzymes of vertebrates can be subdivided 
into two groups based on the clades of phylogenetic tree: 

Figure 3 Phylogenetic tree of 92 GAPD isoenzymes. Phylogenetic tree constructed from nucleotide sequences using the Bayesian algorithm.
Numbers at nodes are the obtained posterior probabilities. Discontinuous lines mark branches of extremely great length (more than 0.75),
which may correspond to pseudogenes or contaminated samples. The tree does not agree with the established view of evolution in every detail;
nevertheless it provides some useful information. For a more accurate definition of GAPD evolution, syntenic analysis was also used.
Clade labels in the figure: Mammals, GAPD-1; Birds; Reptiles, GAPD-1; Frogs; Hagfishes; Tunicates; Salamanders; Teleosts, GAPD-3;
Teleosts, GAPD-1; Cartilaginous fishes, GAPD-1; Lampreys; Echinoderms; Mammals, GAPD-2; Cartilaginous fishes, GAPD-2;
Teleosts, GAPD-2; Arthropods; Flatworms (outgroup).

the group including mammalian GAPD-1 and the group
including mammalian GAPD-2. GAPD of insects separates
before these two groups diverge, which means that
the duplication into GAPD-1 and GAPD-2 took place
after the divergence of protostomes and deuterostomes.
The orthologs of mammalian GAPD-1 and GAPD-2 are
hereafter referred to as GAPD-1 and GAPD-2,
respectively.

The clade including mammalian GAPD-1 is supported 
by a high posterior probability (100%). Inside this clade 
a number of additional duplications were detected. One 
of them apparently happened near the origin of teleosts 
and produced a third GAPD isoenzyme hereinafter 
referred to as GAPD-3. Other independent duplications 
produced additional GAPD isoenzymes in lamprey, hag- 
fish, sea squirt and Xenopus laevis. 

The clade including mammalian GAPD-2 is based on 
a less robust branch with a posterior probability of 77%. 
It splits into a clade of vertebrates (100% posterior prob- 
ability) and a clade including the only GAPD isoen- 
zymes of echinoderms, lancelet, hemichordates and 
Xenoturbella bocki, as well as the second GAPD isoen- 
zyme of some tunicates (77% posterior probability). On 
account of lower support value, merging of these two 
clades into one is questionable and needs confirmation. 

The syntenic analysis showed that GAPD family mem- 
bers of the examined species can be linked to either of 
two loci: the locus syntenic to human GAPD-1 contains 
GAPD-1 of zebrafish, GAPD-1 and GAPD-3 of stickle- 
back, the only GAPD of lancelet and the only GAPD of 
sea squirt; the locus syntenic to human GAPD-2 con- 
tains GAPD-2 of both zebrafish and stickleback (Figure 
4). The similarity between gene layouts within both loci
is rather low: multiple genome micro-rearrangement
events such as deletions and inversions were detected.
The surroundings of GAPD genes in the sea urchin and
acorn worm genomes share no genes
either with each other or with the two syntenic
loci identified. Nor do the genes in these surroundings form
any clusters in the genomes of the other examined species.
This can be attributed to the distant relationships
between the species.

BLAST searches of genes which are syntenic to 
human and fish GAPD-2 were carried out in the gen- 
omes of lancelet, sea squirt, sea urchin and acorn worm. 
They showed that these genes are dispersed in the gen- 
omes rather than combined together in a single locus. 

The constructed synteny maps provide support for 
orthology between GAPD-1 of human, either GAPD-1 
or GAPD-3 of stickleback, GAPD of lancelet and sea 
squirt, as well as between GAPD-2 of human and both 
fishes. These results generally agree with the phyloge- 
netic trees, indicating orthology between appropriate 
isoenzymes of human and fishes. Syntenic analysis 



helped to identify the origin of lancelet and sea squirt
GAPDs, which could not be determined with confidence from
the phylogenetic trees because of low branch
support values. There is also evidence that
GAPD-1 and GAPD-3 of stickleback, and
probably of some other bony fishes, originated through the
teleost-specific whole-genome duplication.

Selective pressure estimation 

Ka/Ks profiles were compared in four clades: mamma- 
lian GAPD-1 and GAPD-2, teleost GAPD-1 and GAPD- 
2, while GAPD of insects was used as an outgroup. To 
avoid saturation in synonymous substitutions which can 
significantly affect the results, pairs of closely related 
sequences were considered (Table 1). 

Results of Ka/Ks profile calculation show that selective 
pressure varies for different regions of GAPD sequences 
(Figure 5). Most regions of all examined sequences are 
suggested to be under strong purifying selection (Ka/Ks 
< 0.1). However, a part of the proline-rich domain of
mammalian GAPD-2 is not constrained by purifying
selection, with Ka/Ks reaching 1.1.
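
For readers unfamiliar with the quantity being profiled, the sketch below estimates Ka and Ks for a pair of aligned coding sequences using a simplified Nei-Gojobori-style counting scheme with Jukes-Cantor correction. It is not necessarily the procedure used to produce the profiles in this study. The simplifications are assumptions of the sketch: mutations to stop codons are counted as nonsynonymous when sites are tallied, mutational pathways through stop codons are skipped, and the two short sequences are hypothetical. A profile would apply the same calculation to successive windows of the alignment.

```python
import math
from itertools import permutations

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1.0 / 3.0
    return total


def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # skip pathways through stop codons
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:  # every pathway hits a stop codon
        return 0.0, float(len(positions))
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        s_sites += s
        n_sites += 3.0 - s
        sd, nd = count_diffs(c1, c2)
        s_diff += sd
        n_diff += nd
    return jukes_cantor(n_diff / n_sites), jukes_cantor(s_diff / s_sites)


# Hypothetical pair: two synonymous changes and one Val -> Ile replacement.
SEQ_A = "ATGGCTAAAGTTGGT"
SEQ_B = "ATGGCCAAGATTGGT"
ka, ks = ka_ks(SEQ_A, SEQ_B)
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}, Ka/Ks = {ka / ks:.3f}")
```

A ratio well below 1, as in this toy pair, indicates purifying selection, whereas values near 1 indicate relaxed constraint. The sketch does not guard against windows without synonymous sites or synonymous differences, where the ratio is undefined.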

In mammalian GAPD-1 and GAPD-2, teleost GAPD-2
and insect GAPD, purifying selection is weakened
approximately between the 85th and 105th positions of the
protein sequences (Section 1; hereafter, amino acid positions
are numbered as in mammalian GAPD-1).
In mammalian GAPD-2 purifying selection is also weakened
between the 265th and 285th positions (Section
2). In teleost GAPD-1 purifying selection is weakened
between the 55th and 75th positions (Section 3).

The regions under weakened purifying selection were
mapped onto the 3D structure of human GAPD-1 (PDB
ID 1u8f). Section 1 corresponds to a buried β-strand
and adjacent loops near the NAD-binding site. Sections
2 and 3 are solvent-exposed regions of the polypeptide
chain also composed of both β-strands and loops.

Selective pressure affecting GAPD family members
was also investigated by means of branch-specific models
as implemented in PAML. Six datasets were examined:
mammalian GAPD-1 (17 sequences) and GAPD-2
(12 sequences), teleost GAPD-1 (10 sequences), GAPD-2
(7 sequences) and GAPD-3 (4 sequences) as well as
insect GAPD (8 sequences). To determine whether the
selective constraints vary between isoenzymes and
lineages, two models were compared: one-ratio (R1) and
six-ratio (R6). R1 assumes a constant Ka/Ks ratio for all
examined GAPD datasets, whereas R6 assumes a different
ratio for each dataset. The obtained Ka/Ks ratios and
the likelihoods of the models are given in Table 2.
The likelihood ratio test (LRT) indicated a significant
difference between the likelihoods of R1 and R6 (2Δ =
148.57, df = 5, p-value = 0.00), implying variation of
selective constraints at least for some datasets.
Figure 4 Synteny maps. Syntenic comparison of GAPD genes among human (Homo sapiens), stickleback (Gasterosteus aculeatus), zebrafish
(Danio rerio), lancelet (Branchiostoma floridae) and sea squirt (Ciona savignyi). A) The locus containing GAPD-1 of human, GAPD-1 of stickleback,
GAPD-3 of stickleback, the only GAPD isoenzyme of lancelet and one of the GAPD isoenzymes of sea squirt. B) The locus containing GAPD-2 of
human, GAPD-2 of stickleback and GAPD-2 of zebrafish. GAPD genes are shown by ovals, other genes by rectangles. Homologues are indicated
by discontinuous lines. The numbers near the yellow axes denote either the numbers of genes that are not shown or the distances in
kilobases.



According to the results obtained for the R6 model, the Ka/Ks
ratios of mammalian GAPD-2 and insect GAPD differ
most from the mean value (Figure 6). Therefore the
hypotheses that the selective constraints differ
between these two and the other datasets were tested.
Three models were compared with R6: the R2m model
assuming a constant Ka/Ks ratio for all datasets except
mammalian GAPD-2, the R2i model assuming a constant Ka/Ks
ratio for all datasets except insect GAPD, and the R3
model assuming a constant Ka/Ks ratio for all datasets
except both mammalian GAPD-2 and insect GAPD (see
Table 2 for the obtained ω-values and likelihoods). The LRT
revealed that the likelihoods of R3 and R6 are not significantly
different (2Δ = 6.73, df = 3, p-value = 0.08), while



Table 1 Pairs of sequences used for Ka/Ks calculation

Taxon      Species                                      Isoenzyme   Sequence identity, %
Mammals    Homo sapiens - Microcebus murinus            GAPD-1      91
           Homo sapiens - Callithrix jacchus            GAPD-2      94
Teleosts   Tetraodon nigroviridis - Takifugu rubripes   GAPD-1      94
           Tetraodon nigroviridis - Takifugu rubripes   GAPD-2      92
Insects    Drosophila ananassae - Drosophila virilis    GAPD        87

Figure 5 Ka/Ks profiles. Five GAPD isoenzymes are compared:
GAPD-1 and GAPD-2 of mammals, GAPD-1 and GAPD-2 of teleosts
and GAPD of insects. Numbering of the X axis (Position) starts immediately
after the proline-rich domain of mammalian GAPD-2 and
corresponds to the positions in protein sequences.



the likelihoods of both R2m and R2i are significantly
lower (2Δ = 64.84, df = 4, p-value = 0 and 2Δ = 60.5, df
= 4, p-value = 0, respectively). This means that the selective
constraints are broadly similar for all three teleost
GAPD isoenzymes and mammalian GAPD-1, stronger
for insect GAPD and weaker for mammalian GAPD-2.
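
The likelihood ratio tests reported above can be reproduced from the log-likelihoods with a few lines of code. The sketch below is an illustration rather than the PAML workflow itself. It takes the test statistic as twice the log-likelihood difference between R6 and the simpler model, and the degrees of freedom as the difference in the number of Ka/Ks ratios. The R1 log-likelihood is assumed to be -22221.41, the value consistent with the reported statistic of 148.57, and the chi-square tail probability is computed with a closed-form recursion that is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Log-likelihoods of the branch-specific models (R1 value is an assumption,
# see the lead-in) and the number of Ka/Ks ratios each model estimates.
LOG_LIK = {"R1": -22221.41, "R2m": -22179.54, "R2i": -22177.37,
           "R3": -22150.49, "R6": -22147.12}
N_RATIOS = {"R1": 1, "R2m": 2, "R2i": 2, "R3": 3, "R6": 6}


def lrt(null_model, alt_model="R6"):
    """Return (statistic, degrees of freedom, p-value) for nested models."""
    stat = 2.0 * (LOG_LIK[alt_model] - LOG_LIK[null_model])
    df = N_RATIOS[alt_model] - N_RATIOS[null_model]
    return stat, df, chi2_sf(stat, df)


for model in ("R1", "R2m", "R2i", "R3"):
    stat, df, p = lrt(model)
    print(f"{model} vs R6: statistic = {stat:.2f}, df = {df}, p = {p:.3g}")
```

Because the log-likelihoods are rounded to two decimals, the recomputed statistics can differ from the reported ones in the last digit. Only the R3 model yields a p-value above 0.05, which is why it is not rejected in favour of R6.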

Discussion 

Evolutionary relationships between GAPD isoenzymes 

In this study we sought to expand the previous phyloge- 
netic investigations of GAPD [42-50] by concentrating 
on deuterostomes. As compared to the study in refer- 
ence [42], which is also focused on deuterostomes, we 
introduced a number of new sequences especially from 
non-mammalian and non-teleost species and carried out 
the syntenic analysis. This allowed more accurate 



Table 2 The Ka/Ks ratio estimates for GAPD isoenzymes
under various branch-specific models

Model   Ka/Ks ratio                               Log-likelihood
R1      0.06369                                   -22221.41
R2m     0.05363 (all except mammalian GAPD-2)     -22179.54
        0.12179 (mammalian GAPD-2)
R2i     0.07405 (all except insect GAPD)          -22177.37
        0.02642 (insect GAPD)
R3      0.06263 (all except listed below)         -22150.49
        0.12186 (mammalian GAPD-2)
        0.02567 (insect GAPD)
R6      0.06672 (mammalian GAPD-1)                -22147.12
        0.05218 (fish GAPD-1)
        0.07492 (fish GAPD-3)
        0.12187 (mammalian GAPD-2)
        0.06218 (fish GAPD-2)
        0.02593 (insect GAPD)

Figure 6 Ka/Ks values obtained with the aid of branch-specific
models. The Ka/Ks values shown were calculated using the six-ratio
(R6) model, which implies different selective constraints for GAPD-1
and GAPD-2 of mammals, GAPD-1, GAPD-2 and GAPD-3 of teleosts,
as well as GAPD of insects. The values for all isoenzymes except
mammalian GAPD-2 and insect GAPD were found not to differ
significantly. The discontinuous horizontal line is the Ka/Ks value
obtained using the one-ratio (R1) model, which implies the same
selective constraints for all isoenzymes.

determination of phylogeny, as well as the identification 
of some novel GAPD isoenzymes, for example the third 
isoenzyme of teleosts. 

The constructed phylogenetic trees provide evidence 
for duplication in the early evolution of chordates which 
gave rise to GAPD-1 and GAPD-2 isoenzymes. It pre- 
sumably took place even before the first whole-genome 
duplication of vertebrates [51-53]. The loci of GAPD-1 
and GAPD-2 were found not to be syntenic to each 
other. It can be explained either by a single-gene dupli- 
cation, which produced a copy of the ancestral GAPD 
gene, or by loss of synteny after a duplication of longer 
genome segment. However, the emergence of GAPD-1 
and GAPD-2 is surely not a result of a retroposition, as 
it was concluded in early studies [54,55], documented 
by similar exon structures of the isoenzymes (Figure 7). 
It should be noted that GAPD is one of the few glycoly- 
tic enzymes that did not acquire any additional isoen- 
zymes during the vertebrate-specific whole-genome 
duplication events; neither did phosphoglucose isomer- 
ase, triosephosphate isomerase and phosphoglycerate 
kinase. The other glycolytic enzymes gained from one to 
three extra copies that evolved to the tissue-specific pro- 
teins [42,56-61]. 

GAPD-2 was lost in most lineages and retained only 
by mammals, lizards, teleosts and cartilaginous fishes. 
The presence of both isoenzymes in these organisms 
raises the question of a functional difference between 
them. It is assumed that if two isoenzymes perform the 

Figure 7 Exon structures of human GAPD-1 and GAPD-2. The
boundaries of exons are shown by vertical lines. The figure is based
on Ensembl data [96].



same function in the same set of tissues, one of them is
free from functional constraints and its gene will eventually
turn into a non-functional pseudogene or will be
deleted [62-64]. In mammals and lizards GAPD-1 and
GAPD-2 specialized into tissue-specific proteins, and this
is probably why one of them escaped loss.
Generally, specialization towards tissue-specificity is
a trend among glycolytic enzymes that have acquired
additional copies. In vertebrates, they usually have distinct
isoenzymes in liver, muscle and brain, sometimes
in erythrocytes and other tissues [42,60]. The
situation with GAPD of teleosts and cartilaginous fishes
is more complex. According to EST data, GAPD-1 and
GAPD-2 of fishes are expressed in the same tissues. The
results of branch-specific tests indicate that the evolutionary
rates of both isoenzymes are accelerated as compared
to the ancestral GAPD (GAPD of insects, which
separated before the emergence of GAPD-1 and GAPD-2,
was considered to evolve at a rate similar to that of the
ancestral protein). This is in line with the model of gene
duplication proposed by Hughes [65,66]. It suggests
that the original gene performed two or more functions.
After duplication each copy specialized in performing
a subset of them. GAPD is known to be a
multifunctional protein participating in many processes
beyond glycolysis. As the catalytic center is conserved in
both isoenzymes, GAPD-1 and GAPD-2 of teleosts and
cartilaginous fishes may have specialized in performing different
non-glycolytic functions, as also suggested by Ka/Ks
profiling. Different regions of teleost GAPD-1 and
GAPD-2 are under weakened purifying selection. These
regions may correspond to the parts of the proteins that
are responsible for performing isoenzyme-specific
non-glycolytic functions.



A number of additional duplications of GAPD genes
occurred independently in certain lineages. For example,
some teleosts possess a third GAPD isoenzyme
(GAPD-3) in addition to GAPD-1 and GAPD-2. Taking
into account both the constructed phylogenetic trees
and the obtained data on syntenies, it can be concluded
that GAPD-3 originated from GAPD-1 during the
teleost-specific whole-genome duplication [67]. However,
GAPD-3 was not found in the complete genomes of
zebrafish, tetraodon and fugu, which means that it was
lost in these species.

The retention of GAPD-3 by certain species of teleosts
agrees with the model of dosage balance proposed by
Papp and colleagues [65,68]. It states that genes whose
optimal dosages depend on each other can be lost only
synchronously after whole-genome duplications and are
therefore preferentially retained. In the study [42], most
of the other glycolytic enzymes were shown to have extra
copies in teleosts, which also originated during the
whole-genome duplication. Therefore, GAPD-3, as well as
the other additional glycolytic isoenzymes in teleosts,
may be retained to prevent dosage imbalance leading to
glycolysis malfunction.

The model of dosage balance also provides an
explanation for Xenopus laevis possessing three slightly
different GAPD isoenzymes [Swiss-Prot:P51469, GenBank:
BC043972, GenBank:BC048770]. According to the results of
phylogenetic analysis (Figure 3), the duplications of the
GAPD-1 gene giving rise to these isoenzymes seem to
have taken place after the divergence of the Xenopus and
Rana genera of frogs. X. laevis is known to have
undergone a whole-genome duplication event about 40
million years ago [69,70], and most of its genes have two
copies [71]. Furthermore, the two GAPD sequences
[GenBank:BC043972, GenBank:BC048770] might be
allelic variants of a single gene, since they are 99%
identical and are supported only by transcript-level
evidence. If so, X. laevis would have only two GAPD genes,
in line with the dosage balance model.

The sea squirt Ciona savignyi was found to possess
three different GAPD isoenzymes as well. All of them
seem to have originated from GAPD-1 after the
emergence of tunicates (Figure 3). To check whether these
isoenzymes are encoded by distinct genes or are allelic
variants of a single gene, we turned to the C. savignyi
genome assembly version 2.0 (Broad Institute) with
redundant alleles removed [72], available via Ensembl.
There were only two GAPD genes
[Ensembl:ENSCSAVG00000004357,
Ensembl:ENSCSAVG00000007442], corresponding to two of
the isoenzymes. The remaining isoenzyme was 97%
identical to one of the others. It is probably just an allelic
variant, since C. savignyi displays extremely high allelic
polymorphism [73].

It looks like the duplication giving rise to the GAPD-1
copies in C. savignyi is advantageous in itself by increasing
GAPD dosage. Otherwise fixation of even two
consecutive duplications seems unlikely. The model of
gene duplications assuming a beneficial increase in gene
dosage has been extensively studied and shown to be
applicable in a number of cases [74-76]. The
duplications of GAPD in the considered species may be
explained by an emerging need to enhance
some non-glycolytic functions of GAPD, as it is hard to
imagine that such a conserved process as glycolysis
needs an increased dose of one of its enzymes.

GAPD-2 specialization to a testis-specific protein 

Mammalian GAPD-2 is known to be a highly specialized
isoenzyme, which is present solely in testis (microarray
data are available in the ArrayExpress database at
http://www.ebi.ac.uk/arrayexpress under accession numbers
E-GEOD-7307, E-GEOD-3526, E-TABM-969 and
E-GEOD-2361) [77-79]. We have found that GAPD-2 is also
expressed in a testis-specific way in two lizard species.
Lizards are also the only lineage besides mammals in which
GAPD-2 possesses the proline-rich domain. Given
that this domain serves as an anchor to the spermatic
filament cytoskeleton, the correlation between its presence
and testis-specific expression seems evident.

As GAPD-2 is a testis-specific protein only in
mammals and lizards, it is likely to have specialized in this
way during the early evolution of amniotes. However,
birds have completely lost GAPD-2. We could not
detect it in any of the examined bird species, including
Gallus gallus and Taeniopygia guttata, whose genomes are
complete. In birds, therefore, the same GAPD isoenzyme
should act in both somatic tissues and testis. It remains
unclear what changes in bird spermatozoa rendered
testis-specific GAPD-2 unnecessary.

GAPD-2 is not the only testis-specific glycolytic isoen- 
zyme. There are also testis-specific isoenzymes of phos- 
phoglycerate kinase (PGK-2) [80,81] and lactate 
dehydrogenase (LDHC) [82-84]. It is remarkable that 
both are possessed only by mammals, thus resembling 
GAPD-2. PGK-2 originated from PGK-1 isoenzyme by 
retrotransposition [85,86], while LDHC stems from the 
LDHA isoenzyme [60]. These events are supposed to
have taken place during the early evolution of
mammals. Perhaps the gain of three testis-specific
glycolytic isoenzymes is a consequence of an alteration of
spermatozoa structure. Mammalian spermatozoa are 
known to have a relatively long and thin tail, complicat- 
ing ATP diffusion from mitochondria along it [87]. 
Therefore, energy is generated mostly by glycolytic 
enzymes located in the tail cytoplasm [34,88]. Such reor- 
ganization of metabolism may require special isoen- 
zymes with distinctive catalytic properties. 



As mentioned before, a unique feature of testis-speci- 
fic GAPD-2 is the additional N-terminal proline-rich 
domain, which is absent in all other GAPD isoenzymes. 
Moreover, there are no additional fragments in PGK-2 
and LDHC. The spatial structure of the proline-rich 
domain is still unsolved. We have found that for the 
majority of mammals it is encoded by two exons. The
first exon encodes a conserved segment of 22 amino
acids. The second exon encodes a segment with a high
content of proline residues, highly variable in both 
length (58-97 amino acids) and composition (see addi- 
tional file 2: The proline-rich N-terminal domains of 
mammalian GAPD-2). The layout of proline residues 
has a strikingly repetitive character. They form Pn and 
(XP)n motifs, where X is any amino acid (often cysteine, 
glutamic acid or glutamine). Generally, polyproline 
repetitive motifs are known to participate in strong but 
unspecific protein-protein interactions [89]. Apparently 
they play the same role in the proline-rich domain of 
GAPD-2 mediating the binding to spermatic filament 
cytoskeleton. The presence of two different kinds of
polyproline motifs suggests that GAPD-2 is bound to
more than one cytoskeletal protein.
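
The two motif classes described above can be located with simple pattern matching. The following is a minimal sketch only: the function names, the minimum repeat count of three and the toy sequence are assumptions made for illustration and are not taken from the analysis itself.

```python
import re

def find_pn(seq, min_n=3):
    """Return (start, match) for runs of at least min_n prolines (Pn motif)."""
    pattern = r"P{%d,}" % min_n
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]

def find_xp(seq, min_n=3):
    """Return (start, match) for at least min_n tandem X-P dipeptides,
    where X is any residue other than proline ((XP)n motif)."""
    pattern = r"(?:[^P]P){%d,}" % min_n
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq)]

# Hypothetical toy sequence, not a real GAPD-2 fragment.
toy = "MSCPEPQPAPPPPKL"
pn_hits = find_pn(toy)   # runs of prolines
xp_hits = find_xp(toy)   # alternating X-P repeats
```

Requiring X to differ from proline keeps the two classes distinct, so that a long proline run is not also reported as an (XP)n repeat.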

Evidence for unspecific recognition of the proline-rich
domain by cytoskeletal proteins is also furnished by the
results of Ka/Ks calculation. The Ka/Ks value estimated for
the variable segment of the proline-rich domain of
mammalian GAPD-2 was close to unity, which means
that this segment is subject to neither purifying nor
positive selection and therefore its specific sequence is
not important for function.

The proline-rich domain is likely to be relatively 
young since it is absent in all other GAPD isoenzymes 
and no similar sequences have been revealed in other 
proteins by means of BLAST searches. So-called exoni- 
zation of non-coding sequences is now assumed to be 
the source of new protein domains [90-93]. The repeti- 
tive character of the proline-rich domain sequence 
implies that it could have emerged from a microsatellite 
region. This way of new domain origination was pro- 
posed to be a general mechanism for the repetitive pro- 
tein sequences [90,94,95]. 

Tissue-specific alternative splicing was found to
govern the presence of the proline-rich domain in GAPD-2
of the lizard Anolis carolinensis: it depends on a cassette
exon being either spliced out or retained. Unfortunately, no
conclusion can be made as to whether this mechanism 
preceded GAPD-2 specialization to a testis-specific pro- 
tein or appeared after it. It may be that GAPD-2 first 
incorporated the proline-rich domain as a rare optional 
splice variant in some tissues and only then specialized 
towards testis-specificity. 



Kuravsky et al. BMC Evolutionary Biology 2011, 11:160
http://www.biomedcentral.com/1471-2148/11/160






Conclusions 

The results of our study substantially expand the cur- 
rent knowledge on evolution of GAPD family members. 
We show that GAPD-1 and GAPD-2 isoenzymes of 
mammals are also present in other lineages. We specu- 
late that they emerged after duplication of the ancestral 
GAPD gene during the early evolution of chordates. 
GAPD-1 then underwent a number of additional inde- 
pendent duplications in different species, while GAPD-2 
was lost in most lineages and is now found only in 
mammals and lizards, as well as cartilaginous and bony 
fishes. 

We have demonstrated that GAPD-2 of mammals and 
lizards is specialized to a testis-specific protein. Accord- 
ingly, in these lineages GAPD-2 has acquired the novel 
N-terminal proline-rich domain anchoring the protein 
to the sperm tail cytoskeleton. This domain is likely to 
have originated by exonization of a microsatellite
genomic region in a common ancestor of amniotes. Estimates
of selective pressure suggest unspecific recognition of
the proline-rich domain by cytoskeletal proteins. Besides
testis, GAPD-2 of lizards was also found in some
regenerating tissues, where it lacks the proline-rich domain
due to tissue-specific alternative splicing.

Methods 

Sequence data 

In the previous study [42], GAPD-2 was shown to be 
specific for vertebrates. Therefore we decided to limit 
the consideration of GAPD isoenzymes and focused 
only on those belonging to vertebrates and also to the 
other groups of deuterostomes since they were not 
examined in [42]. In order to find all GAPD sequences 
of deuterostomes, we first turned to the Ensembl data- 
base [96]. 69 sequences of mammals and bony fishes 
were obtained from it as belonging to glyceraldehyde-3- 
phosphate dehydrogenase protein family [Ensembl: 
ENSFM00250000000211]. Second, a PSI-BLAST [97] 
search using the human GAPD-1 [SwissProt:P04406] as 
query (which was selected to be a typical example of 
GAPD) was conducted against UniProt [98]. Since 
GAPD is known to be a well-conserved protein, a strict 
e-value threshold of 10' was chosen. The search con- 
verged in 6 steps returning 8957 hits, all of which 
showed more than 30% of identity to the query 
sequence. All in all 60 sequences of deuterostomes were 
picked out (excluding fragments and those previously 
obtained from the Ensembl database). We also
selectively picked out 13 sequences of the major protostome
phyla (arthropods, mollusks, annelid worms,
roundworms and flatworms). Third, an additional 55 sequences
were obtained by employing the TBLASTN algorithm with
default parameters [97] to search with human GAPD-1 



[SwissProt:P04406] and GAPD-2 [SwissProt:O14556] as
queries in the EST division of GenBank [99]. EST hits, 
which usually represent fragments of complete mRNAs, 
were manually scanned for extensive overlapping regions 
and then joined into larger sequences. Further inspec- 
tion revealed some cases of contamination, which were 
excluded from the analysis. Specifically, we identified a 
chicken EST [GenBank:AM067846] actually belonging 
to Aspergillus flavus and three lancelet ESTs [GenBank: 
FE567488, GenBank:FE567489, GenBank:BW781185] 
belonging to some diatoms. As a result of this three-step
procedure, a total of 197 GAPD sequences were
identified for 131 species (109 deuterostomes and 22
other animals, see additional file 3: Accession codes of
GAPD sequences used in the analysis).
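
The joining of overlapping EST fragments was done manually in this study. Purely as an illustration of the idea, the sketch below joins two fragments when a suffix of one exactly matches a prefix of the other. The function name, the exact-match criterion and the `min_overlap` threshold are assumptions and do not describe the procedure actually applied.

```python
def join_by_overlap(left, right, min_overlap=40):
    """Join two fragments if a suffix of `left` equals a prefix of `right`.

    The longest exact overlap of at least `min_overlap` characters is used.
    Returns the merged sequence, or None if no such overlap exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None

# Hypothetical toy fragments with a 5-nucleotide overlap ("CGTTT").
merged = join_by_overlap("ACGTACGTTT", "CGTTTGGA", min_overlap=4)
```

Real ESTs contain sequencing errors, so an exact-match rule is stricter than a manual inspection would be; a tolerant comparison would be needed in practice.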

Multiple alignment and phylogeny reconstruction

Since phylogenetic tree reconstruction is a computation- 
ally expensive process, only a part of the obtained 
sequences was subjected to the analysis. No more than 7 
species from each class of deuterostomes were considered, 
as well as 6 species of insects as the representatives of pro- 
tostomes (for more details see additional file 3: Accession 
codes of GAPD sequences used in the analysis). Two 
slightly different GAPD sequences from the flatworm 
Macrostomum lignano, both derived from several ESTs 
[GenBank:EG952499, GenBank:EG951174, GenBank: 
EG952414, GenBank:EG952720, GenBank:EG953822], 
were used as an outgroup. The total dataset for phyloge- 
netic analysis comprised 92 GAPD sequences. Multiple 
alignment of protein sequences was performed by MUS- 
CLE [100] and then manually edited. The alignment of 
nucleic sequences was constructed by means of RevTrans 
1.4 Server [101] based on the protein alignment (see addi- 
tional file 4: Raw alignment of GAPD nucleic sequences 
used in the phylogenetic analysis). Columns with gaps 
were eliminated before phylogenetic analysis. 
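
The elimination of gapped columns can be sketched as follows. This is an illustrative example only: the alignment is represented as a plain dictionary of equal-length strings, and the function name and toy data are assumptions.

```python
def drop_gap_columns(alignment, gap="-"):
    """Remove every column that contains a gap in at least one sequence.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = zip(*alignment.values())
    keep = [i for i, column in enumerate(columns) if gap not in column]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}

# Hypothetical toy protein alignment.
toy = {"a": "MK-LV", "b": "MKALV", "c": "M-ALV"}
ungapped = drop_gap_columns(toy)
```

In a codon alignment derived from a protein alignment, gaps occur in multiples of three, so removing gapped columns in this way should preserve the reading frame.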

The phylogenetic relationships between GAPD family 
members were reconstructed using both protein and 
nucleic sequences. The Bayesian method of tree recon- 
struction as implemented in MrBayes 3.1.2 [102,103] 
software was applied. The JTT model of amino-acid 
change [104], as well as the GTR model of nucleotide 
substitutions [105] were used. Preliminary analyses indi- 
cated that variation at the third position was saturated 
and confounded resolution at deep internal nodes. 
Therefore, trees based on nucleotide data were recon- 
structed in MrBayes by partitioning the data into the 
first, second and third codon positions, and allowing 
each partition to evolve at its own rate with its own 
shape parameter of gamma distribution. 
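
The partitioning by codon position amounts to taking every third column of an in-frame alignment. The sketch below shows only this splitting step, under the assumption that the alignment starts at the first codon position; the function name is hypothetical, and the partition itself was defined within MrBayes.

```python
def split_codon_positions(alignment):
    """Split an in-frame nucleotide alignment into three sub-alignments
    holding the first, second and third codon positions, respectively."""
    return tuple(
        {name: seq[offset::3] for name, seq in alignment.items()}
        for offset in range(3)
    )

# Hypothetical toy alignment of two codons per sequence.
first, second, third = split_codon_positions({"s1": "ATGGCC", "s2": "ATGGCA"})
```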

For the Bayesian analyses, two independent runs were
performed, each with four simultaneous chains that were
sampled every 100 generations. Trees sampled before the
cold chain reached stationarity, as judged from plots of the
maximum likelihood scores, were discarded. Sampling
continued until convergence was achieved based on the
average standard deviation of the split frequencies as
given in MrBayes. Node support was assessed as Bayesian
posterior probabilities.
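
The convergence diagnostic mentioned above compares how often each split (bipartition) is sampled in the independent runs. The following is a rough sketch of that idea rather than a reproduction of the MrBayes calculation: each tree is represented as a collection of splits, each split as a frozenset of taxon names, and the use of the sample standard deviation and of a 0.1 frequency cutoff are assumptions.

```python
from collections import Counter
from statistics import stdev

def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    `runs` is a list of runs; each run is a list of trees; each tree is an
    iterable of splits (frozensets of taxon names). Splits whose frequency
    is below `min_freq` in every run are ignored.
    """
    if len(runs) < 2:
        raise ValueError("at least two runs are required")
    freqs = []
    for trees in runs:
        counts = Counter(split for tree in trees for split in set(tree))
        freqs.append({s: c / len(trees) for s, c in counts.items()})
    splits = {s for f in freqs for s, v in f.items() if v >= min_freq}
    if not splits:
        return 0.0
    total = sum(stdev([f.get(s, 0.0) for f in freqs]) for s in splits)
    return total / len(splits)

# Hypothetical toy samples: run 1 always supports split {A,B},
# run 2 supports it in only half of the sampled trees.
AB, AC = frozenset({"A", "B"}), frozenset({"A", "C"})
run1 = [[AB], [AB], [AB], [AB]]
run2 = [[AB], [AB], [AC], [AC]]
value = asdsf([run1, run2])
```

Values approaching zero indicate that the runs sample the same splits at similar frequencies.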

Syntenies 

Syntenic analysis is a reliable approach for establishing 
orthology. It is based on the assumption that local sur- 
roundings of genes are rarely affected by genomic rear- 
rangements. Therefore, if the two genes have 
homologous neighbors, they are likely to have originated 
by vertical descent from a single ancestor and, in other 
words, be orthologous. 

A syntenic analysis of the relationship between GAPD
family members was performed by identifying the
positions of up to 20 genes both upstream and downstream
of GAPD genes in human (Homo sapiens), stickleback
(Gasterosteus aculeatus), zebrafish (Danio rerio), lancelet
(Branchiostoma floridae), sea squirt (Ciona savignyi), sea
urchin (Strongylocentrotus purpuratus) and acorn worm
(Saccoglossus kowalevskii). Syntenic maps were
constructed based on gene location information
either available from Ensembl (human, both fishes
and sea squirt) or obtained by conducting BLASTX
searches of adjacent genomic regions against
non-redundant protein databases. In the latter case, genes
were considered homologous if the identity of their
protein product sequences was greater than 30%. The
following genomes were used: B. floridae version 2.0 (Joint
Genome Institute) [51], S. purpuratus version 2.1 
(Human Genome Sequencing Center) [106] and S. 
kowalevskii version 1.0 (Human Genome Sequencing 
Center). Since genomic micro-rearrangements might 
occur, the matches between the local surroundings of 
GAPD genes were not required to be co-linear for 
establishing orthology. Gene losses and insertions were 
allowed as well. 
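
The comparison of local gene surroundings can be illustrated as follows. This is a simplified sketch under stated assumptions: gene orders are given as lists, homology between neighbors is supplied as a precomputed mapping from gene to family (`family_of`), and all names and toy data are hypothetical. Shared neighbors are collected as a set, so neither co-linearity nor the absence of gene losses and insertions is required.

```python
def shared_neighbours(order_a, gene_a, order_b, gene_b, family_of, flank=20):
    """Return the gene families found within `flank` genes on either side
    of `gene_a` in `order_a` and of `gene_b` in `order_b`."""
    def flank_families(order, gene):
        i = order.index(gene)
        window = order[max(0, i - flank):i] + order[i + 1:i + 1 + flank]
        return {family_of[g] for g in window if g in family_of}
    return flank_families(order_a, gene_a) & flank_families(order_b, gene_b)

# Hypothetical toy chromosomes around two GAPD genes.
order_a = ["a1", "a2", "GAPD_a", "a3", "a4"]
order_b = ["b4", "GAPD_b", "b9", "b1"]
family_of = {"a1": "F1", "b1": "F1", "a4": "F4", "b4": "F4",
             "a2": "F2", "b9": "F9"}
shared = shared_neighbours(order_a, "GAPD_a", order_b, "GAPD_b", family_of)
```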

Synonymous and non-synonymous substitution rates 

To examine whether the GAPD family members are 
subjected to adaptive evolution, an analysis of variation 
under selective pressure was performed. Usually
selective pressure is estimated by comparing the rates of
synonymous (Ks) and non-synonymous substitutions
(Ka) for the entire sequence. If the Ka/Ks value is greater
than unity, the whole sequence is supposed to be under 
positive selection, otherwise under purifying selection 
[107-109]. However, since each amino acid has a differ- 
ent function, the type and strength of natural selection 



may be different for each amino acid. To detect the var- 
iation in Ka/Ks values across the sequence a sliding-win- 
dow approach is often used [110,111]. 

Alignments of nucleotide sequences were constructed
by PAL2NAL [112] based on protein alignments. Ka/Ks
profiles were generated using a window of 120 base
pairs and a step of 20 base pairs. Such a wide window
was used because of the high conservation of the analyzed
sequences. Calculations of Ka and Ks for each window
position were carried out with the aid of DnaSP 5.10
software [113].
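
The window layout used for such a profile can be sketched as below. The example only generates window coordinates and applies a caller-supplied function to each pair of aligned slices; it does not implement a Ka/Ks estimator, which in this study was provided by DnaSP. The function names and the mismatch-counting stand-in are assumptions.

```python
def sliding_windows(length, window=120, step=20):
    """Return (start, end) pairs, 0-based and end-exclusive, for all
    windows of size `window` that fit entirely within `length`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, length - window + 1, step)]

def window_profile(seq_a, seq_b, estimator, window=120, step=20):
    """Apply `estimator` to each pair of aligned slices and return
    (window midpoint, value) pairs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return [((s + e) // 2, estimator(seq_a[s:e], seq_b[s:e]))
            for s, e in sliding_windows(len(seq_a), window, step)]

# Stand-in estimator for illustration only: counts mismatches.
def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))

profile = window_profile("ACGTAC", "ACGAAC", mismatches, window=4, step=2)
```

Since a step of 20 base pairs is not a multiple of three, most windows do not start on a codon boundary; how the reading frame is restored within each window is left to the estimator.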

Branch-specific selection tests 

The differences in selective pressure between GAPD
isoenzymes were also examined by means of more
sophisticated branch-specific models as implemented in the
codeml program from the PAML package [114]. Such
models assume separate Ka/Ks values for different branches
of the phylogenetic tree. They are often used for
detecting selection changes after gene duplications, where one
copy might evolve at a different rate due to acquisition
of a new function or the loss of an old one [115-118].

First, GAPD sequences were divided into groups 
according to the results of phylogenetic analysis. Then a 
number of branch-specific models assuming separate 
Ka/Ks ratios for different combinations of groups were 
tested. A likelihood ratio test (LRT) was used to
determine whether the likelihoods of a pair of alternative
branch-specific models are significantly different.
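
For nested models, the LRT statistic is twice the difference in log-likelihood and is commonly compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below illustrates this calculation using only the standard library; the function names and the log-likelihood values are hypothetical and are not results of this study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer
    `df`, via the recurrence Q(a+1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a+1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, y)
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:                            # even df: start from Q(1, y)
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += math.exp(-y) * y ** a / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic and its chi-square p-value."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(max(statistic, 0.0), df)

# Hypothetical log-likelihoods: a one-ratio model versus a model with
# one additional branch-specific ratio (one extra parameter).
statistic, p_value = likelihood_ratio_test(-1000.0, -998.0, df=1)
```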

Additional material 



Additional file 1: Alignment of the two forms of Anolis carolinensis 
GAPD-2 mRNA. The sequence of protein product is represented; the 
possible start codons are in bold. Translation of the full-length mRNA 
seems to begin from the first start codon and the protein product will 
possess the proline-rich domain (shown in italic). The shortened mRNA 
lacks the first start codon. Therefore translation should begin from the 
second one resulting in a protein product without the proline-rich 
domain. 

Additional file 2: The proline-rich N-terminal domains of 
mammalian GAPD-2 

Additional file 3: Accession codes of GAPD sequences used in the 
analysis 

Additional file 4: Raw alignment of GAPD nucleic sequences used in 
the phylogenetic analysis 